Lineage-specific regulatory elements underlie adaptation of species and play
a role in disease susceptibility. We compared functionally conserved and
lineage-specific enhancers by cross-mapping 5042 human and 6564 mouse
heart enhancers. Of these, 79 per cent are lineage-specific, lacking a functional
orthologue. Heart enhancers tend to cluster and, commonly, there are multiple
heart enhancers in a heart locus, providing regulatory stability to the locus.
We observed little cross-clustering, however, between lineage-specific and
functionally conserved heart enhancers, suggesting regulatory function
acquisition and development in loci previously lacking heart activity. We also
identified 862 human-specific heart enhancers: 417 featuring sequence
conservation with mouse (class II) and 445 with neither sequence nor function
conservation (class III). Ninety-eight per cent of class III enhancers were
deleted from the mouse genome, and we estimated a similar-sized enhancer
gain in the human lineage. Human-specific enhancers display no detectable
decrease in negative selection pressure and are strongly associated with
genes partaking in the heart regulatory programmes. The loss of a heart
enhancer could be compensated by the activity of a redundant heart enhancer;
however, we observed redundancy in only 15 per cent of class II and III
enhancer loci, indicating a large-scale reprogramming of the heart regulatory
programme in mammals.
1. Introduction

Genome-wide association studies estimate that around 85 per cent of
disease-causal variants reside outside protein-coding DNA [1], and large-scale
international efforts to map functional elements in the human genome estimate
up to 400 000 regulatory elements in the non-coding part of the human
genome [2]. In contrast to genes, regulatory elements are often
lineage-specific [3] and, thus, are not amenable to the classical comparative
genomics methods [4-6]. Numerous lineage-specific regulatory elements are a
footprint of rapid evolutionary changes in human regulomes and represent
rapid evolutionary innovation underlying the adaptive response of the human
lineage. Understanding the gain, loss and evolutionary forces acting on
human- (and primate-) specific regulatory elements is critical for our
understanding of the impact of gene regulation on human adaptation and
disease.

As comparative genomics methods [7-9] cannot be used to identify
lineage-specific regulatory elements, we can rely only on direct enhancer
(and silencer) discovery: mapping open chromatin regions (DNase I and similar
experiments; [10,11]), or chromatin immunoprecipitation (ChIP) with massively
parallel DNA sequencing (ChIP-Seq) targeting bound transcription factors [12],
enhancer cofactors P300 and CBP [13,14] and/or specific histone
modifications [15].

In particular, ChIP-Seq experiments targeting the transcriptional co-activator
P300 have been very accurate in identifying tissue-specific enhancers in the
human and mouse genomes [16-19]. In this study, we compared human and mouse
P300 heart enhancer sets [20]. We addressed the differences between heart
enhancers conserved in mammals and heart enhancers specific to the human
lineage only. We found that conserved heart enhancers have a greater impact
on the expression level of affected genes than lineage-specific heart
enhancers. Conserved heart enhancers have higher GC-content, overlap with
more CpG islands and are more enriched in a histone modification known to
mark stronger enhancers (H3K9ac) than human-specific heart enhancers.
We observed a pronounced loss and gain of heart enhancers, both on the
sequence and functional levels. Often, redundant/shadow enhancers prevent the
loss of regulatory function upon the loss of an enhancer, while gene
expression levels display a significant change when the loss corresponds to
a single heart enhancer in a locus. Our results also indicate that strong
negative selection constraints act upon acquired, lineage-specific enhancers.
In summary, our findings provide new insights into the lineage-specific
regulatory programmes and establish a foundation for studying the regulatory
diversities between species.

Figure 1. R²-value between the expression profiles of different mouse heart
developmental stages and the expression profile of foetal human heart.



2. Results 

(a) Only a small fraction of heart enhancers is 
conserved between humans and mice 

We compared the expression profiles of different mouse heart developmental
stages with the expression profile of foetal human heart (figure 1; see §4
for details). The comparison suggests that the expression profile of
postnatal mouse heart is more similar to that of foetal human heart than are
the profiles of foetal and adult mouse heart. Moreover, Henderson et al. [21]
have previously shown that the morphological stage of human heart
development is complete by the end of week 7 of embryonic development, which
corresponds to birth in mouse. Therefore, we used P300 ChIP-Seq heart
enhancers from foetal human (gestation week 16) and postnatal mouse (day 2)
[20], two stages that show similar developmental progression and gene
expression profiles, as shown in earlier studies [20,21]. In total, 5042
human heart enhancers and 6564 mouse heart enhancers were analysed. After
cross-mapping human and mouse heart enhancers [22], 1066 of them were found
to be conserved between the two species. This provides an estimate of 79 per
cent of human heart enhancers (3976 of 5042) being lineage-specific, with
only a minor part of heart enhancers conserved within the mammalian branch
of the evolutionary tree, consistent with previous studies [19,20].



(b) Genomic characteristics of conserved and lineage- 
specific heart enhancers 

To study the evolutionary trends of the heart regulatory programme, we
classified human P300 heart enhancers into those shared between humans and
mice (dubbed simply shared for the rest of the manuscript) and
lineage-specific (1066 versus 3976, respectively; figure 2). Shared heart
enhancers are closer to transcriptional start sites (TSSs; Student's t-test
p-value = 1.3 × 10^-[?]) and populate shorter loci (Student's t-test
p-value = 7.6 × 10^-[?]) when compared with lineage-specific heart
enhancers. In addition, shared heart enhancers feature higher GC-content
(Student's t-test p-value = 2.8 × 10^-[?]) and are more often located in
CpG islands (Fisher's exact test p-value = 6.9 × 10^-[?]).

(c) Shared and lineage-specific enhancers operate in 
functional clusters 

Next, we performed a stochastic simulation (with 1000 replicates) to analyse
the clustering between shared and lineage-specific heart enhancers. In each
replicate, we randomly selected 500 shared and 500 lineage-specific heart
enhancers and computed the percentage of clustered heart enhancers (more
than two heart enhancers located in the same locus). Almost twice as many
shared heart enhancers are clustered with shared heart enhancers (36%) than
with lineage-specific heart enhancers (19%; figure 3a; Fisher's exact test
p-value = 1.8 × 10^-[?]). Likewise, more lineage-specific heart enhancers
are clustered with lineage-specific heart enhancers (24%) than with shared
heart enhancers (16%; figure 3b; Fisher's exact test p-value = 0.002). This
suggests the importance of regulatory redundancy in loci of genes expressed
in the heart. Also, the reduced clustering between shared and
lineage-specific enhancers indicates a separation of ancestral and novel
regulatory programmes, with recent regulatory structures targeting and
building up within loci that previously lacked heart expression.
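
The resampling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: each enhancer is reduced to a hypothetical locus identifier, and an enhancer is counted as clustered when its locus holds at least one other sampled enhancer of the partner category, which is a simplification of the criterion given in the text. The function names are invented for this example.

```python
import random
from collections import Counter


def clustered_fraction(query_loci, partner_loci, same_set=False):
    """Fraction of query enhancers whose locus also holds a partner enhancer.

    query_loci / partner_loci: lists of locus identifiers, one per enhancer.
    With same_set=True the partner list is the query list itself, so a locus
    must hold at least two enhancers for any of them to count as clustered.
    """
    counts = Counter(partner_loci)
    needed = 2 if same_set else 1
    hits = sum(1 for locus in query_loci if counts[locus] >= needed)
    return hits / len(query_loci)


def simulate(shared_loci, specific_loci, sample_size=500, replicates=1000, seed=0):
    """Average clustering fractions of shared enhancers over random subsamples.

    Returns (fraction clustered with shared, fraction clustered with
    lineage-specific), each averaged over all replicates.
    """
    rng = random.Random(seed)
    with_shared = 0.0
    with_specific = 0.0
    for _ in range(replicates):
        shared = rng.sample(shared_loci, sample_size)
        specific = rng.sample(specific_loci, sample_size)
        with_shared += clustered_fraction(shared, shared, same_set=True)
        with_specific += clustered_fraction(shared, specific)
    return with_shared / replicates, with_specific / replicates
```

Sampling equal numbers from both categories, as the text describes, keeps the comparison from being skewed by the larger number of lineage-specific enhancers.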

(d) Three categories of lineage-specific enhancers 

To study how loss or gain of heart enhancers affects the expression of
flanking genes and their function in the heart, we further partitioned
lineage-specific human heart enhancers into three classes based on
evolutionary sequence conservation and the strength of the ChIP-Seq signal
in their mouse sequence orthologues. In particular, we were concerned with
possible false-positive lineage-specific elements arising from the strict
ChIP-Seq cut-offs used to call a region an enhancer. To avoid a potential
negative impact of experimental uncertainties, we defined class I as
equivocal lineage-specific enhancers (3114 regions) whose sequences were
conserved in the mouse genome and whose mouse homologous regions did not
overlap ChIP-Seq peaks, but featured a ChIP-Seq read count exceeding the
genome-wide background level (see §4 for details). By definition, class I
enhancers represent a mix of true human-specific heart enhancers and
enhancers potentially shared between humans and mice that were not assigned
to the shared class simply because of the low signal in the mouse
experiment. Therefore, all conclusions stemming from the analysis of the
class I data should be treated with caution. The remaining true
lineage-specific human heart enhancers (for which we were confident no
significant mouse heart enhancer functionality could be detected) were split
into class II (417 regions), featuring sequence conserved with mouse, and
class III (445 regions), featuring no sequence homology in the mouse genome.
To avoid ambiguous results, class I enhancers, which could represent weak
heart enhancers in mouse and, thus, be reclassified as shared enhancers,
were mainly excluded from the following functional studies.

Figure 2. Genomic differences between shared and lineage-specific heart
enhancers (panels compare shared, lineage-specific and random sequences).
(a) Distance to the closest TSS (green, >50 k; red, 10-50 k; blue, <10 k),
(b) locus length (green, >500 k; red, 200-500 k; blue, <200 k),
(c) GC-content (blue denotes shared; red, lineage-specific; green, random),
and (d) CpG island overlaps.

Figure 3. Clustering of heart enhancers with (a) shared and (b)
lineage-specific heart enhancers.

(e) Majority of non-conserved human-specific heart 
enhancers were lost in the mouse lineage 

There are two evolutionary routes that can give rise to a human-specific
enhancer with no sequence homology in the mouse genome (class III): either a
sequence insertion in the human lineage or a sequence deletion in the mouse
lineage. To distinguish between these two scenarios, we used sequence
alignments between human and seven distant species forming an out-group
(dog, cat, horse, cow, opossum, chicken and frog) to study the evolutionary
history of the genomic regions in question. Ninety-eight per cent (436/445)
of these enhancer sequences are present in at least one of the seven distant
species. This points to the ancestral nature of class III enhancers and
supports a model of enhancer sequence deletion in the mouse lineage for the
majority of enhancers forming this class. This, in turn, indicates that the
reshaping of the regulatory genome is largely a result of enhancer sequence
loss. As there are no fewer mouse heart enhancers than human heart enhancers
(see §2a for raw counts), this enhancer loss should have been compensated by
enhancer gain. Class II (and partially class I) enhancers can possibly shed
light on the enhancer gain path. Novel human heart enhancers should have
been formed by heart enhancer function acquisition without sequence loss in
mouse (class II) or by gradual function gain with or without sequence
divergence (class I).



(f) Highly expressed heart genes rely on functionally 
conserved enhancers 

To study how different heart enhancers affect the expression of flanking
genes, we identified 1000 highly expressed heart genes (see §4 for details)
and calculated the percentage of heart enhancers located in the loci of
these highly expressed heart genes (figure 4). Significantly more shared
heart enhancers are located in the loci of highly expressed heart genes than
either class II/III human-specific heart enhancers (Fisher's exact test
p-value = 0.045 and 1.9 × 10^-[?], respectively) or random expectation
(5000 randomly selected regions; Fisher's exact test
p-value = 1.7 × 10^-[?]). This suggests elevated levels of negative
selection acting on heart enhancers located proximally to genes highly
expressed in the heart, from the viewpoint of both sequence and function
selection. In addition, more conserved human-specific heart enhancers
(class II) are located in the loci of highly expressed heart genes than
human-specific heart enhancers lacking sequence similarity with mouse
(class III; Fisher's exact test p-value = 0.034). As class III corresponds
to sequences that have been predominantly lost in the mouse lineage
(see §2e), these results suggest that the decreased strength of selection
acting on the class III elements, which allowed their loss in the mouse
lineage, was probably the result of the reduced importance of the class III
elements in the developmental heart programme, as reflected in their genomic
location away from genes highly expressed in the heart.

Figure 4. Percentage of heart enhancers and controls located in the loci of
genes highly expressed in the heart. Asterisks indicate statistical
significance (*p-value < 0.05 and **p-value < 0.001).

(g) Shared heart enhancers are associated with 
strong enhancers 

Histone modifications have been shown to play a critical role in determining
spatio-temporal gene expression patterns in vertebrate genomes [15,23], and
several histone modifications have been established as reliable indicators
of gene regulatory elements. For this project, we were particularly
interested in the H3K4me1 and H3K9ac histone modifications associated with
enhancers and H3K4me3 associated with promoters [24,25]. We also used
DNase I hypersensitive sites to study chromatin accessibility [10]. We
analysed the distribution of distinct histone modifications around heart
enhancers that belong to different categories (figure 5). Among all histone
modifications, H3K9ac shows the most significant difference between shared
heart enhancers and class II/III human-specific heart enhancers (Fisher's
exact test p-value = 2.1 × 10^-[?] and 1.2 × 10^-[?], respectively), with
shared enhancers demonstrating significantly stronger association with
H3K9ac. As H3K9ac is known to be characteristic of strong, distant
enhancers [25], it is likely that shared enhancers are more potent in
activating gene expression. In addition, elevated DNase I levels in
proximity to shared heart enhancers indicate higher levels of chromatin
accessibility to trans-acting factors in the genomic regions they occupy
(figure 5).

(h) Human heart enhancers are under negative 
selection pressure 

We used two methods, derived allele frequency (DAF) and the
McDonald-Kreitman test (MK test), to investigate the selective constraints
under which heart enhancers evolve. We downloaded human variation data
generated by the 1000 Genomes Project [26] and used the ancestral allele
information based on the six-way primate alignment from the Ensembl compara
database [27] to determine the DAF for each variant. In addition, we used
16 315 pseudogenes from the Pseudogenes.org database [28] as the neutral
reference. Shared heart enhancers and class II/III human-specific heart
enhancers all showed a higher fraction of low-frequency variants
(DAF < 5%) compared with the neutral reference (table 1; Fisher's exact
test p-value = 6.8 × 10^-[?], 3.1 × 10^-[?] and 7.4 × 10^-[?],
respectively), suggesting that all classes of human heart enhancers are
under negative selection pressure.

We also used the MK test to study selection on human heart enhancers. Human
polymorphic sites (P; variation within species) were contrasted with
non-polymorphic human sites that differ from their chimpanzee
(Pan troglodytes) counterparts (D; fixed differences, i.e. variation between
species with no intra-species variation). P and D counts in enhancers
(P_e and D_e, respectively) were compared with the corresponding neutral
reference counts (P_n and D_n) calculated using pseudogenes. The neutrality
index, defined as (P_e/D_e)/(P_n/D_n), was determined for each type of heart
enhancer. All three types of heart enhancers were subject to negative
selection (neutrality index > 1; table 2), with class II human-specific
heart enhancers having the largest neutrality index, which is consistent
with the DAF result (table 1). These results are particularly interesting
for the class II and III heart enhancers, as they indicate no decrease in
selective pressure on novel heart enhancers (class II) or heart enhancers
prone to loss in other lineages (class III).
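
As a worked check of the neutrality index, the short sketch below recomputes it from the polymorphism and fixed-difference counts reported in table 2. The function and variable names are chosen for this illustration only, and the index is assumed to be the ratio of enhancer P/D to pseudogene P/D, the reading that reproduces the tabulated values.

```python
def neutrality_index(p_enh, d_enh, p_neutral, d_neutral):
    """Return (P_e / D_e) / (P_n / D_n).

    Values above 1 mean an excess of within-species polymorphism relative to
    fixed differences, compared with the neutral reference.
    """
    return (p_enh / d_enh) / (p_neutral / d_neutral)


# Polymorphism (P) and fixed-difference (D) counts as listed in table 2.
PSEUDOGENES = (349789, 442395)
ENHANCERS = {
    "shared": (12893, 10148),
    "class II": (5736, 4312),
    "class III": (6283, 5400),
}


def table2_indices():
    """Neutrality index per enhancer class, rounded to two decimals."""
    return {
        name: round(neutrality_index(p, d, *PSEUDOGENES), 2)
        for name, (p, d) in ENHANCERS.items()
    }
```

Calling `table2_indices()` gives 1.61, 1.68 and 1.47 for the shared, class II and class III enhancers, matching the neutrality index column of table 2.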

(i) Biological function of heart genes flanking shared 
heart enhancers 

We performed a gene ontology (GO) analysis to quantify the association of
heart enhancers with the biological function of flanking genes. We selected
all cardiac GO terms containing 'heart', 'cardiac' or 'cardio' in their
names for the analysis. In total, 398 cardiac GO terms and 346 genes
annotated to these categories were obtained. All categories of heart
enhancers display enrichment in cardiac GO terms, confirming the heart
function of genes flanking these enhancers (figure 6). When different
enhancer categories are compared, shared heart enhancers display the
strongest enrichment in the loci of heart genes (binomial test
p-value = 9.4 × 10^-[?]), again pointing to the important function this
category of heart enhancers plays in heart development. Notably, the lowest
fold enrichment was observed for the class II elements, indicating that the
gain of regulatory function takes place in loci previously not strongly
involved in the heart regulatory programme. At the same time, the class III
fold enrichment almost reaches the level of shared enhancers, suggesting
that the loss of the class III enhancer counterparts in the mouse genome had
a direct impact on the mouse heart regulatory programme.

Figure 5. (a-d) Distribution of histone modifications around heart
enhancers. Heart enhancers are contrasted to a random set of 1000
non-coding-conserved sequences. Blue denotes shared; red, class II; green,
class III; purple, random.

Table 1. DAF in heart enhancers versus controls (pseudogenes).

            shared heart enhancers   class II         class III        pseudogenes
DAF < 5%    10 640 (74.3%)           4793 (77.1%)     5006 (74.7%)     192 359 (68.4%)
total       14 311                   6213             6701             281 111
p-value     6.8 × 10^-[?]            3.1 × 10^-[?]    7.4 × 10^-[?]

Table 2. Neutrality test (MK test) for distinct heart enhancers.

                         polymorphism (P)   fixed difference (D)   neutrality index   p-value
shared heart enhancers   12 893             10 148                 1.61               2.1 × 10^-[?]
class II                 5736               4312                   1.68               5.8 × 10^-[?]
class III                6283               5400                   1.47               3.5 × 10^-[?]
pseudogenes              349 789            442 395                1.00

(j) Loss of heart enhancers in the mouse lineage 

There are no functional orthologues of class II and class III
human-specific heart enhancers in the mouse genome, which allows us to
directly quantify the effects of an enhancer loss on the level of gene
expression between human and mouse. To analyse whether the lack of heart
enhancer function in the corresponding loci of the mouse genome had a
footprint on the expression of flanking genes, we investigated whether the
loss of a heart enhancer in the mouse genome (or an enhancer gain in the
human genome) was balanced by a gain of a new heart enhancer in the same
mouse locus and/or whether the effects of the enhancer loss/gain were
mitigated by other redundant/shadow enhancers in that locus. We centred our
analysis on the human loci that have orthologous counterparts in the mouse
genome. Within the mouse locus counterparts, 3814 mouse heart enhancers were
mapped and 2748 of them were mouse-specific (based on function conservation,
not necessarily sequence conservation). In addition, because human-specific
heart enhancers can reside in the same locus as shared heart enhancers that
are sufficient to maintain the heart regulatory activity in the locus
(figure 3), we used only singleton heart enhancers (heart enhancers without
any other heart enhancers located in the same locus) for this analysis. In
total, 220 singleton shared heart enhancers, 101 singleton class II heart
enhancers and 119 singleton class III heart enhancers were used.

In contrast to the 60 per cent of shared heart enhancers whose orthologous
mouse locus contains at least one other mouse-specific heart enhancer, only
15 per cent of class II and 26 per cent of class III enhancers feature the
same trend (figure 7a). As class III almost exclusively corresponds to the
loss of enhancers in the mouse genome (as opposed to an enhancer gain in the
primate lineage), these results suggest that the loss of a heart enhancer in
mouse is not balanced by an independent heart enhancer gain in the host
mouse locus. As singleton heart enhancers represent only a minority of all
heart enhancers, in many loci the loss of a single enhancer can be
compensated by the activity of other, redundant heart enhancers. However,
these results identify single-enhancer loci as open to regulatory
reprogramming, and indicate that heart activity has been lost or gained in
many of them. In addition, even though the mouse orthologous loci of 15 per
cent of class II heart enhancers feature additional mouse-specific heart
enhancers, the average distance between the mouse-specific heart enhancer
and the mouse orthologue of the human-specific heart enhancer is over
200 kb, fourfold larger than the distance between the mouse orthologues of
shared human heart enhancers and their neighbouring mouse-specific heart
enhancers (figure 7b). As class II represents the primary set of heart
enhancers that have been gained in the human lineage, this suggests that
even cases of convergent evolution, with independent heart enhancer
acquisition in the human and mouse genomes, might still differ in the
regulatory mechanisms associated with the acquired heart regulatory
activity.

Figure 6. Cardiac GO-term enrichment in genes flanking heart enhancers.
Heart enhancers are contrasted to a random set of 1000 non-coding-conserved
sequences.

Figure 7. Clustering of the syntenic mouse loci to singleton human heart
enhancers with mouse-specific enhancers. The syntenic mouse loci did not
host a single heart enhancer for the class II and class III heart enhancers.
(a) Percentage of three distinct human heart enhancers whose loci also
contain at least one mouse-specific heart enhancer in the mouse genome.
(b) Distance between human heart enhancers and mouse-specific heart
enhancers when they are clustered in the same loci. (c) The fraction of
highly expressed genes flanking mouse heart enhancers (blue denotes
clusters; red, non-clusters).

The ultimate impact of the regulatory change following the loss and gain of
heart enhancers is the change in the heart expression levels of genes
flanking the affected enhancers. In particular, we were interested in
learning whether (i) the loss of a human enhancer in the mouse lineage leads
to a decreased level of heart expression and (ii) the presence of another
mouse-specific enhancer in proximity to the site of loss could mitigate the
impact on the expression level. To address these questions, we studied the
expression level of genes flanking the mouse counterparts of these singleton
human-specific heart enhancers. Indeed, we observed a decrease in the
fraction of highly expressed mouse heart genes flanking non-clustered (those
that do not have an additional mouse enhancer in the locus) mouse
counterparts of class II (binomial test p-value = 0.03) and class III
(binomial test p-value = 0.09) human enhancers when compared with mouse
orthologues of shared enhancers (figure 7c). The presence of another mouse
enhancer in the locus (clustered) mitigates the impact on the gene
expression level upon the loss of a heart enhancer (figure 7c), but this
qualitative observation did not reach statistical significance in our
analysis owing to the low fraction of class II and class III mouse
counterparts featuring another mouse heart enhancer in the locus
(figure 7a). These results indicate that the loss and gain of singleton
heart enhancers in a locus has a pronounced impact on the regulatory
activity of the corresponding genes.



3. Discussion 

In this study, we identified 1066 heart enhancers shared by humans and mice,
and 3976 lineage-specific human heart enhancers. By comparing the
distribution of distinct genome features between shared and lineage-specific
heart enhancers in the human genome, we found that lineage-specific heart
enhancers are more distant from transcription start sites and are located in
longer loci than shared enhancers. This suggests that, even though plenty of
distal regulatory elements have been discovered in vertebrate genomes
[29,30], the most important regulatory elements are located closer to the
affected genes [31] and proximal regulatory elements have a higher impact on
the affected genes than distal regulatory elements [32]. In addition, we
found that shared heart enhancers have higher GC-content and overlap with
more CpG islands, which is consistent with results showing that functionally
conserved elements always have higher GC-content and overlap with CpG
islands [33].

Clustering analysis of shared and lineage-specific human heart enhancers, as
shown in figure 3, indicates that lineage-specific human heart enhancers
tend to cluster with lineage-specific human heart enhancers more than with
shared human heart enhancers. Furthermore, we found that sequence
orthologues of human-specific heart enhancers are rarely clustered with
mouse-specific heart enhancers, indicating loss of regulatory function in
the mouse genome with no functional compensation by redundant enhancers in
the majority of cases. Expression analysis of the genes flanking shared and
human-specific heart enhancers shows that shared heart enhancers have a
greater impact on the expression level of affected genes than human-specific
heart enhancers. A functional footprint study also shows that enrichment of
the H3K9ac mark associated with strong enhancers is almost twofold greater
in shared heart enhancers than in human-specific heart enhancers, suggesting
that enhancer activity can be higher in shared heart enhancers than in
human-specific heart enhancers. GO analysis also indicates that more genes
affected by shared heart enhancers are related to heart function than genes
affected by human-specific heart enhancers, and no particular cardiac GO
term was enriched only in the human-specific heart enhancers. All of these
results suggest that, even though human-specific heart enhancers are lost in
the mouse genome and no counterparts exist in the mouse genome to compensate
for the affected genes, these affected genes are not as important to heart
function as those affected by shared heart enhancers.

While the DNA sequences of non-conserved human-specific heart enhancers were
predominantly deleted in the mouse genome, investigation of selective
constraints indicates that these sequences are still evolving under negative
selection pressure in humans. In addition, based on GO analysis, we found
that genes affected by non-conserved human-specific heart enhancers are more
related to heart functions than genes affected by conserved human-specific
heart enhancers. Mouse-specific counterparts are also found in the same loci
more often for non-conserved than for conserved human-specific heart
enhancers. Together, these observations suggest that non-conserved
human-specific heart enhancers still play an important role in heart
function.

In summary, this study provides new insights into the 
evolution and functional role of lineage-specific regulatory 
elements in mammals. 

4. Material and methods 

(a) Comparison of different mouse heart developmental
stages with foetal human heart

Expression profiles of three different mouse heart developmental stages,
i.e. foetal (E11.5; GSE1479), postnatal (one week; GSE38754) and adult (nine
months; GSE41810), were compared with the expression profile of foetal human
heart (GSE1789). Expression profiles of human genes were mapped to the mouse
genome using Homologene (http://www.ncbi.nlm.nih.gov/homologene), and the
R²-value was calculated for the expression profile of each mouse heart
developmental stage to study the correlation between the expression profiles
of foetal human heart and the three mouse heart developmental stages.

(b) Shared and lineage-specific heart enhancers 

P300 ChIP-Seq human heart enhancers (gestational week 16; 5042 sequences)
and mouse heart enhancers (postnatal day 2; 6564 sequences) were downloaded
from May et al. [20]. Mouse heart enhancers were mapped to the human genome
using the liftOver tool [22], resulting in 6230 orthologous human sequences.
Human heart enhancers that overlap orthologous sequences of mouse heart
enhancers were termed shared heart enhancers (1066 sequences). The remaining
human heart enhancers were treated as lineage-specific heart enhancers
(3976 sequences).

Figure 8. Determination of locus length for intergenic/intronic heart
enhancers.
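
The overlap step can be illustrated with a minimal sketch. It assumes, hypothetically, that enhancers are given as (chromosome, start, end) half-open intervals in human coordinates and that any overlap of at least one base counts; the text does not state a minimum overlap, so that threshold is an assumption, as are the function and variable names.

```python
from collections import defaultdict


def classify_enhancers(human, lifted_mouse):
    """Split human enhancers into shared and lineage-specific lists.

    human, lifted_mouse: lists of (chrom, start, end) half-open intervals in
    human coordinates; lifted_mouse holds mouse enhancers after cross-mapping.
    A human enhancer is called shared if it overlaps any lifted mouse
    enhancer by at least one base (an assumed criterion).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in lifted_mouse:
        by_chrom[chrom].append((start, end))

    shared, specific = [], []
    for chrom, start, end in human:
        # Half-open intervals overlap when each starts before the other ends.
        hit = any(s < end and start < e for s, e in by_chrom[chrom])
        (shared if hit else specific).append((chrom, start, end))
    return shared, specific
```

The linear scan per chromosome keeps the sketch short; for sets of several thousand enhancers a sorted-interval or interval-tree lookup would be the usual choice.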

(c) Locus length 

Locus length for intergenic/intronic heart enhancers was deter- 
mined separately (figure 8). For an intergenic heart enhancer, 
the locus encompasses two flanking genes and the intergenic 
region separating them; for an intronic heart enhancer, the 
locus comprises the host gene and the two intergenic intervals 
flanking it. 
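
The two locus definitions can be expressed as a small function. This is a sketch under simplifying assumptions that are not in the original text: genes on a chromosome are given as sorted, non-overlapping (start, end) intervals, any position inside a gene is treated as intronic, and chromosome ends stand in for a missing neighbouring gene.

```python
def locus_bounds(pos, genes, chrom_length):
    """Return (start, end) of the locus around an enhancer at position pos.

    genes: list of (start, end) half-open gene intervals on one chromosome,
    sorted by start and assumed not to overlap.
    """
    for i, (g_start, g_end) in enumerate(genes):
        if pos < g_start:
            # Intergenic: both flanking genes plus the region between them.
            left = genes[i - 1][0] if i > 0 else 0
            return left, g_end
        if pos < g_end:
            # Inside a gene: host gene plus both flanking intergenic intervals.
            left = genes[i - 1][1] if i > 0 else 0
            right = genes[i + 1][0] if i + 1 < len(genes) else chrom_length
            return left, right
    # Past the last gene: only a left flanking gene exists.
    left = genes[-1][0] if genes else 0
    return left, chrom_length


def locus_length(pos, genes, chrom_length):
    """Length in bases of the locus returned by locus_bounds."""
    start, end = locus_bounds(pos, genes, chrom_length)
    return end - start
```

For example, with genes at (100, 200), (500, 700) and (1000, 1200), an intergenic enhancer at position 300 gets the locus (100, 700), while an enhancer at position 600 inside the second gene gets (200, 1000).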

(d) Lineage-specific heart enhancers were partitioned 
into three classes 

All lineage-specific human heart enhancers were mapped to the mouse genome
using liftOver [22], resulting in 3531 orthologous mouse sequences.
Lineage-specific heart enhancers were then separated into three classes
based on their orthologous sequences in the mouse genome. Class III
consisted of the 445 lineage-specific heart enhancers that could not be
mapped to the mouse genome (non-conserved, human-specific heart enhancers).
Out of the 3531 orthologous sequences in the mouse genome, 3114 had higher
ChIP-Seq read densities than the average ChIP-Seq read density for all
mouse-human evolutionarily conserved regions (ECRs; [34]) and were
considered equivocal heart enhancers (class I). The other 417 orthologous
sequences, whose ChIP-Seq read densities did not exceed the average ChIP-Seq
read density for all ECRs, were considered
conserved human-specific heart enhancers (class II).
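
The partition rule amounts to a two-step decision, sketched below. The function name and the use of `None` for an unmapped enhancer are conventions invented for this illustration; the threshold is the average read density over all ECRs, as stated in the text.

```python
def assign_class(mouse_ortholog_density, ecr_mean_density):
    """Assign a lineage-specific human heart enhancer to class I, II or III.

    mouse_ortholog_density: ChIP-Seq read density over the mouse orthologous
        sequence, or None when the enhancer could not be mapped to mouse.
    ecr_mean_density: average ChIP-Seq read density over all mouse-human ECRs.
    """
    if mouse_ortholog_density is None:
        return "III"  # no sequence orthologue in the mouse genome
    if mouse_ortholog_density > ecr_mean_density:
        return "I"    # equivocal: mouse signal above the ECR average
    return "II"       # sequence conserved, signal not above the ECR average
```

The reported counts are consistent with this scheme: 3114 class I plus 417 class II enhancers give the 3531 mapped sequences, and adding the 445 unmapped class III enhancers gives the 3976 lineage-specific enhancers.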

(e) Top 1000 highly expressed human and mouse 
heart genes 

Gene expression data in foetal human heart were downloaded 
from the GEO database (GSE1789), and average expression 
levels across five normal foetal heart samples were calculated. 
In addition, gene expression data for 10 human non-heart tissues 
in the HG-U133_Plus_2.tissue-mixture-data-set were down- 
loaded from Affymetrix (http://www.affymetrix.com) and 
average expression levels for these 10 human non-heart tissues 
were calculated. In both expression datasets, a 
small arbitrary number (16) was added to each expression 
value to avoid misleading readings for low expression values. 
The expression level of each gene was log-transformed, and the 
ratio of the expression level in the heart tissue to 
that in the non-heart tissue was calculated. The top 1000 genes 
with the highest ratio were selected as the highly expressed 
human heart genes. 
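
A minimal sketch of this ranking follows. It assumes that the 
pseudo-count is added to the averaged expression values, that the 
logarithm is base 2, and that the heart to non-heart ratio is 
taken as a difference of logs; none of these details is specified 
above, and the function names and toy values are hypothetical. 

```python
import math

PSEUDOCOUNT = 16  # small arbitrary number added to each expression value

def heart_specificity(heart_values, other_values):
    """Log2 ratio of mean heart expression to mean non-heart expression."""
    heart_mean = sum(heart_values) / len(heart_values)
    other_mean = sum(other_values) / len(other_values)
    return math.log2(heart_mean + PSEUDOCOUNT) - math.log2(other_mean + PSEUDOCOUNT)

def top_genes(heart, other, n=1000):
    """Return the n genes with the highest heart-specificity score."""
    scores = {g: heart_specificity(heart[g], other[g]) for g in heart if g in other}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy expression values (hypothetical)
heart = {"A": [1008, 1008], "B": [16, 16], "C": [48, 48]}
other = {"A": [48, 48], "B": [240, 240], "C": [48, 48]}
ranked = top_genes(heart, other, n=2)
```

The pseudo-count damps ratios between genes that are barely 
expressed in both tissues, which would otherwise dominate the 
ranking. 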

For the mouse genome, gene expression data for heart tissue of 
one-week-old mice were downloaded from the GEO 
database (GSE38754), and average gene expression levels across 
five samples were calculated. In addition, gene expression data for 
90 mouse non-heart tissues were downloaded from the GEO 
database (GSE10246). Normalization similar to that applied to the 
human data was performed to obtain the top 1000 highly expressed 
mouse heart genes. 

(f) Estimation of selection pressure using 1000 Genomes 
Project data 

SNP data generated by the 1000 Genomes Project [26] were used to 
study selective constraints for distinct classes of heart enhancers. 
Two methods, derived allele frequency (DAF) and the 
McDonald-Kreitman (MK) test, were used to estimate selection 
pressure. To determine the DAF for each SNP, we downloaded the 
ancestral allele information based on a six-way primate alignment 
from the Ensembl Compara database [27]. SNP sites within each class 
of heart enhancers were identified, and the DAF of each site 
was determined. To analyse the selective pressure on human heart 
enhancers, we used 5 per cent as the cut-off threshold for DAF 
because variants with DAF > 5% are defined as common variants 
and SNPs with DAF < 5% are often referred to as low-frequency 
variants [35]. Fisher's exact test was used to determine the 
significance of the DAF distribution for each class of heart 
enhancers compared with the neutral reference (pseudogenes). 
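
The DAF comparison can be sketched as a 2 x 2 contingency test. 
The example below is an illustration rather than the pipeline 
used here: it assumes a two-sided test, assigns a DAF of exactly 
5 per cent to the common group, and uses hypothetical function 
names and toy frequencies. 

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

def daf_table(enhancer_dafs, neutral_dafs, cutoff=0.05):
    """Return (rare_enh, common_enh, rare_neutral, common_neutral)."""
    def counts(dafs):
        rare = sum(1 for f in dafs if f < cutoff)
        return rare, len(dafs) - rare
    return counts(enhancer_dafs) + counts(neutral_dafs)

# Toy derived allele frequencies (hypothetical)
enhancer_dafs = [0.01, 0.02, 0.03, 0.40]
neutral_dafs = [0.01, 0.20, 0.30, 0.60]
table = daf_table(enhancer_dafs, neutral_dafs)
p_value = fisher_exact_two_sided(*table)
```

An excess of low-frequency derived alleles in an enhancer class 
relative to pseudogenes is the signal expected under negative 
selection. 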

The MK test was also used to estimate selective constraints 
on human heart enhancers. The MK test compares 
polymorphism (P), i.e. variation within species, 
with fixed difference (D), i.e. variation between species but not 
within species, and thereby assesses the fixation rate of variants. 
Polymorphism (P) was estimated as the number of SNP sites across 
heart enhancers, and fixed difference (D) was determined as the 
difference between the number of nucleotide differences between 
human and chimpanzee (d) and the number of heterozygous sites 
across heart enhancers (π), i.e. D = d - π. The ratio between 
polymorphism and fixed difference was calculated for each class of 
heart enhancers (Pe/De) and was compared with the ratio for the 
neutral reference (Pn/Dn). The neutrality index is defined as 
(Pe/De)/(Pn/Dn). A neutrality index greater than 1 
(Pe/De > Pn/Dn) indicates that human heart enhancers 
have been subject to negative selection; a neutrality index 
less than 1 (Pe/De < Pn/Dn) indicates that human heart 
enhancers have been subject to positive selection. Fisher's exact 
test was also used to estimate the significance of the MK test. 
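
The neutrality index follows directly from these definitions, as 
the short sketch below shows. The function names and the toy 
counts are hypothetical. 

```python
def fixed_differences(d, pi):
    """D = d - pi: human-chimpanzee differences minus heterozygous sites."""
    return d - pi

def neutrality_index(p_e, d_e, p_n, d_n):
    """NI = (Pe/De) / (Pn/Dn), enhancers (e) versus the neutral reference (n)."""
    return (p_e / d_e) / (p_n / d_n)

# Toy counts (hypothetical)
d_e = fixed_differences(d=100, pi=20)   # 80 fixed differences in enhancers
ni = neutrality_index(p_e=120, d_e=d_e, p_n=50, d_n=100)
verdict = "negative selection" if ni > 1 else "positive selection" if ni < 1 else "neutral"
```

In this toy case Pe/De = 1.5 and Pn/Dn = 0.5, so the index is 3 
and the enhancers carry an excess of polymorphism relative to 
fixed differences. 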



(g) Gene ontology analysis 

To determine whether distinct classes of heart enhancers are 
overrepresented near some particular classes of heart genes, the 
closest genes for each class of heart enhancers were used as 
the test dataset (T), and all genes near all heart enhancers were 
treated as the background dataset (B). Fisher's exact test was per- 
formed to calculate the p-value for each GO term and the 
Bonferroni multiple-testing correction was used to determine 
whether a GO term is significantly enriched in the test dataset. 
In addition, to determine whether distinct classes of heart enhan- 
cers are enriched in different heart genes and are associated with 
different heart function, only GO terms containing 'heart', 'car- 
diac' or 'cardio' in their names and genes associated with these 
GO terms were used in our analysis. In total, 398 GO terms 
and 346 genes were used. 
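
The enrichment procedure can be sketched as follows. The sketch 
assumes a one-sided (over-representation) form of Fisher's exact 
test, which equals the upper tail of the hypergeometric 
distribution when the test set is a subset of the background, and 
a significance level of 0.05 after Bonferroni correction. Both 
are assumptions, as are the function names and toy values. 

```python
from math import comb

HEART_KEYWORDS = ("heart", "cardiac", "cardio")

def is_heart_term(name):
    """Keep only GO terms whose name contains 'heart', 'cardiac' or 'cardio'."""
    lowered = name.lower()
    return any(keyword in lowered for keyword in HEART_KEYWORDS)

def enrichment_p(k, n_test, K, N):
    """P(X >= k) when n_test genes are drawn from N background genes, K of which carry the term."""
    upper = min(n_test, K)
    tail = sum(comb(K, x) * comb(N - K, n_test - x) for x in range(k, upper + 1))
    return tail / comb(N, n_test)

def bonferroni_significant(p_values, alpha=0.05):
    """Terms that remain significant after multiplying each p-value by the number of tests."""
    m = len(p_values)
    return {term for term, p in p_values.items() if p * m <= alpha}

# Toy example (hypothetical): 4 of 4 test genes carry a term found in 4 of 20 background genes
p_values = {
    "heart development": enrichment_p(k=4, n_test=4, K=4, N=20),
    "cardiac conduction": 0.5,
}
significant = bonferroni_significant(p_values)
```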

(h) Clustering of human-specific heart enhancers and 
mouse-specific heart enhancers 

To study whether the loss of human-specific heart enhancers 
in the mouse genome is balanced by mouse-specific heart 
enhancers, so that the flanking genes are still activated by 
compensating mouse-specific heart enhancers, we first identified, 
for each class of human heart enhancers, singleton human heart 
enhancers, i.e. those not clustered with any other human heart 
enhancer in the same locus. In total, 220 shared heart enhancers, 101 
conserved human-specific heart enhancers (class II) and 119 
non-conserved human-specific heart enhancers (class III) were 
singleton human heart enhancers. All human-specific heart 
enhancers were then mapped into the mouse genome using liftOver 
[22], and the percentage of singleton human heart enhancers 
whose orthologous loci in the mouse genome contained at least 
one mouse-specific heart enhancer was calculated for each class 
of human heart enhancers. 
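
The final percentage can be computed as in the sketch below, 
which takes the orthologous mouse locus of each singleton 
enhancer as given. The function name, locus identifiers and 
counts are hypothetical. 

```python
def percent_compensated(singleton_loci, mouse_specific_counts):
    """Percentage of singleton enhancers whose orthologous mouse locus
    contains at least one mouse-specific heart enhancer.

    singleton_loci: orthologous mouse locus identifier for each singleton enhancer.
    mouse_specific_counts: locus identifier -> number of mouse-specific heart enhancers.
    """
    if not singleton_loci:
        return 0.0
    hits = sum(1 for locus in singleton_loci
               if mouse_specific_counts.get(locus, 0) >= 1)
    return 100.0 * hits / len(singleton_loci)

# Toy loci (hypothetical): only locus "a" holds mouse-specific enhancers
singleton_loci = ["a", "b", "c", "d"]
mouse_specific_counts = {"a": 2, "c": 0}
pct = percent_compensated(singleton_loci, mouse_specific_counts)
```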